
ReEvalMed: Rethinking Medical Report Evaluation by Aligning Metrics with Real-World Clinical Judgment

Li, Ruochen, Li, Jun, Jian, Bailiang, Yuan, Kun, Zhu, Youxiang

arXiv.org Artificial Intelligence

Automatically generated radiology reports often receive high scores from existing evaluation metrics but fail to earn clinicians' trust. This gap reveals fundamental flaws in how current metrics assess the quality of generated reports. We rethink the design and evaluation of these metrics and propose a clinically grounded Meta-Evaluation framework. We define clinically grounded criteria spanning clinical alignment and key metric capabilities, including discrimination, robustness, and monotonicity. Using a fine-grained dataset of ground-truth and rewritten report pairs annotated with error types, clinical significance labels, and explanations, we systematically evaluate existing metrics and reveal their limitations in interpreting clinical semantics, such as failing to distinguish clinically significant errors, over-penalizing harmless variations, and lacking consistency across error severity levels. Our framework offers guidance for building more clinically reliable evaluation methods.
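The monotonicity criterion above can be made concrete with a small check: a trustworthy metric should assign progressively lower scores as the clinical severity of the injected error increases. The sketch below is illustrative only; the function name, score values, and severity buckets are assumptions, not artifacts of the paper.

```python
# Sketch: testing the "monotonicity" criterion for a report-evaluation metric.
# Scores are grouped by the severity of the error injected into the rewritten
# report, ordered from least to most severe.

def is_monotonic(scores_by_severity):
    """Return True if mean metric scores strictly decrease as severity rises.

    scores_by_severity: list of score lists, e.g.
    [no-error rewrites, minor variations, clinically significant errors].
    """
    means = [sum(s) / len(s) for s in scores_by_severity]
    return all(a > b for a, b in zip(means, means[1:]))

# Illustrative scores from a hypothetical metric:
scores = [
    [0.95, 0.93, 0.97],  # rewrites with no clinical error
    [0.90, 0.91, 0.89],  # harmless stylistic variations
    [0.88, 0.90, 0.87],  # clinically significant errors
]
print(is_monotonic(scores))
```

A metric that fails this check (or passes it only by a negligible margin, which is where the discrimination criterion comes in) cannot be trusted to rank reports by clinical quality.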


Developer Insights into Designing AI-Based Computer Perception Tools

Guhan, Maya, Hurley, Meghan E., Storch, Eric A., Herrington, John, Zampella, Casey, Parish-Morris, Julia, Lázaro-Muñoz, Gabriel, Kostick-Quenet, Kristin

arXiv.org Artificial Intelligence

Artificial intelligence (AI)-based computer perception (CP) technologies use mobile sensors to collect behavioral and physiological data for clinical decision-making. These tools can reshape how clinical knowledge is generated and interpreted. However, effective integration of these tools into clinical workflows depends on how developers balance clinical utility with user acceptability and trustworthiness. Our study presents findings from 20 in-depth interviews with developers of AI-based CP tools. Interviews were transcribed, and inductive thematic analysis was performed to identify four key design priorities: 1) account for context and ensure explainability for both patients and clinicians; 2) align tools with existing clinical workflows; 3) customize appropriately to relevant stakeholders for usability and acceptability; and 4) push the boundaries of innovation while aligning with established paradigms. Our findings highlight that developers view themselves not merely as technical architects but also as ethical stewards, designing tools that are both acceptable to users and epistemically responsible (prioritizing objectivity and pushing clinical knowledge forward). We offer the following suggestions to help achieve this balance: documenting how design choices around customization are made, defining limits for customization choices, transparently conveying information about outputs, and investing in user training. Achieving these goals will require interdisciplinary collaboration between developers, clinicians, and ethicists.


Quantifying Symptom Causality in Clinical Decision Making: An Exploration Using CausaLM

Shetty, Mehul, Jordan, Connor

arXiv.org Artificial Intelligence

Current machine learning approaches to medical diagnosis often rely on correlational patterns between symptoms and diseases, risking misdiagnoses when symptoms are ambiguous or common across multiple conditions. In this work, we move beyond correlation to investigate the causal influence of key symptoms, specifically "chest pain", on diagnostic predictions. Leveraging the CausaLM framework, we generate counterfactual text representations in which target concepts are effectively "forgotten," enabling a principled estimation of the causal effect of that concept on a model's predicted disease distribution. By employing the Textual Representation-based Average Treatment Effect (TReATE), we quantify how the presence or absence of a symptom shapes the model's diagnostic outcomes, and contrast these findings against correlation-based baselines such as CONEXP. Our results offer deeper insight into the decision-making behavior of clinical NLP models and have the potential to inform more trustworthy, interpretable, and causally grounded decision support tools in medical practice.
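The core of a TReATE-style estimate is a comparison between the classifier's predicted disease distribution on the original representation and on a counterfactual one with the concept "forgotten". The sketch below shows that comparison only; in the actual CausaLM framework the counterfactual representations come from an adversarially fine-tuned language model, and the function names and aggregation here are our own illustrative choices.

```python
# Sketch of a TReATE-style causal-effect estimate: average shift in the
# predicted disease distribution when a target concept (e.g. "chest pain")
# is removed from the text representation.

def treate(model, original_reprs, counterfactual_reprs):
    """Mean total-variation distance between predictions on original
    representations and on concept-'forgotten' counterfactuals."""
    effects = []
    for orig, cf in zip(original_reprs, counterfactual_reprs):
        p_orig = model(orig)  # predicted distribution over diseases
        p_cf = model(cf)      # distribution with the concept "forgotten"
        effects.append(sum(abs(a - b) for a, b in zip(p_orig, p_cf)) / 2)
    return sum(effects) / len(effects)

# Toy example: the "model" just returns precomputed distributions.
identity = lambda p: p
orig = [[0.7, 0.2, 0.1]]  # with "chest pain" present
cf = [[0.4, 0.4, 0.2]]    # with "chest pain" forgotten
print(round(treate(identity, orig, cf), 3))
```

A large value indicates the concept causally drives the diagnosis; a correlation-based baseline such as CONEXP would instead compare predictions across examples that merely do or do not mention the symptom.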


Addressing cognitive bias in medical language models

Schmidgall, Samuel, Harris, Carl, Essien, Ime, Olshvang, Daniel, Rahman, Tawsifur, Kim, Ji Woong, Ziaei, Rojin, Eshraghian, Jason, Abadir, Peter, Chellappa, Rama

arXiv.org Artificial Intelligence

There is increasing interest in the application of large language models (LLMs) to the medical field, in part because of their impressive performance on medical exam questions. While promising, exam questions do not reflect the complexity of real patient-doctor interactions. In reality, physicians' decisions are shaped by many complex factors, such as patient compliance, personal experience, ethical beliefs, and cognitive bias. Taking a step toward understanding this, our hypothesis posits that when LLMs are confronted with clinical questions containing cognitive biases, they will yield significantly less accurate responses compared to the same questions presented without such biases. In this study, we developed BiasMedQA, a benchmark for evaluating cognitive biases in LLMs applied to medical tasks. Using BiasMedQA we evaluated six LLMs, namely GPT-4, Mixtral-8x7B, GPT-3.5, PaLM-2, Llama 2 70B-chat, and the medically specialized PMC Llama 13B. We tested these models on 1,273 questions from the US Medical Licensing Exam (USMLE) Steps 1, 2, and 3, modified to replicate common clinically relevant cognitive biases. Our analysis revealed varying effects of these biases on the LLMs, with GPT-4 standing out for its resilience to bias, in contrast to Llama 2 70B-chat and PMC Llama 13B, which were disproportionately affected by cognitive bias. Our findings highlight the critical need for bias mitigation in the development of medical LLMs, pointing towards safer and more reliable applications in healthcare.
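The benchmark's core measurement reduces to comparing accuracy on unmodified USMLE-style questions against the same questions with a cognitive-bias cue injected into the prompt. The sketch below illustrates that comparison; the answer letters and the notion of a "bias cue" are placeholders, not content from BiasMedQA itself.

```python
# Sketch: measuring the accuracy drop a bias cue causes, in the spirit of
# BiasMedQA. Predictions would come from querying an LLM with plain vs.
# bias-modified versions of each question.

def accuracy(predictions, answers):
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

def bias_sensitivity(preds_plain, preds_biased, answers):
    """Accuracy lost when the bias cue is present (positive = model harmed)."""
    return accuracy(preds_plain, answers) - accuracy(preds_biased, answers)

answers = ["B", "C", "A", "D"]
plain = ["B", "C", "A", "A"]   # 3/4 correct on unmodified questions
biased = ["B", "A", "A", "A"]  # 2/4 correct with, e.g., a recency-bias cue
print(bias_sensitivity(plain, biased, answers))  # 0.25
```

Computing this per bias type and per model yields the kind of comparison the abstract reports, e.g. a near-zero sensitivity for GPT-4 versus large drops for the smaller models.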


AI in the hands of imperfect users

#artificialintelligence

As the use of artificial intelligence and machine learning (AI/ML) continues to expand in healthcare, much attention has been given to mitigating bias in algorithms to ensure they are employed fairly and transparently. Less attention has fallen to addressing potential bias among AI/ML’s human users or factors that influence user reliance. We argue for a systematic approach to identifying the existence and impacts of user biases while using AI/ML tools and call for the development of embedded interface design features, drawing on insights from decision science and behavioral economics, to nudge users towards more critical and reflective decision making using AI/ML.


Deep-learning system identifies difficult-to-detect brain metastases – Physics World

#artificialintelligence

Researchers at Duke University Medical Center have developed a deep-learning-based computer-aided detection (CAD) system to identify difficult-to-detect brain metastases on MR images. The algorithm exhibited excellent sensitivity and specificity, outperforming other CAD systems in development. The tool shows potential to enable earlier identification of emerging brain metastases, allowing them to be targeted with stereotactic radiosurgery (SRS) when they first appear and, for some patients, reducing the number of required treatments. SRS, which uses precisely focused photon beams to deliver a high dose of radiation to targets in the brain in a single radiotherapy session, is evolving into the standard-of-care treatment for patients with a limited number of brain metastases. To target a metastasis, however, it must first be identified on an MR image.


How cutting-edge AI technology is improving surgical precision

#artificialintelligence

Artificial intelligence (AI) is improving surgical planning, guidance and review, says Paul Mussenden, Chief Executive Officer, Cydar Medical. It's operating in all areas of healthcare and helping join up the different stages of the care pathway. That's because AI is very good at rationalising lots of complex data in a broad range of areas such as imaging data, diagnostic data, clinical data and genetic data -- and using it to personalise healthcare for individual patients. It gives clinicians the best information and new insights to make better decisions. Over the last 15 years, there has been a big shift to minimally invasive procedures.


Fragmented CDS Tech Poses Problems for Healthcare Data Interoperability

#artificialintelligence

The earliest clinical decision support systems date back to the 1960s, when pharmacists used automated technology to check patient allergies, research dosages, and screen for drug-drug interactions.(1) Now, according to recent estimates, up to 74 percent of healthcare provider organizations use clinical decision support (CDS) technology.(2) These systems harness the power of artificial intelligence (AI) to help provide clinicians, staff members, patients, and others with person-specific health information. In New Jersey, a CDS system known as Clover Assistant is taking hold as an invaluable resource for physicians--the platform provides clinicians with patient-specific information that is relevant to the visit, as well as actionable insights to help improve long-term outcomes and guide preventative care.(3) But there is still the problem of fragmentation. Stuart Long, CEO of InfoBionic, a leading digital cardiac health company, says, "CDS systems are great for helping physicians arrive at appropriate and timely clinical decisions regarding many aspects of patient care."


Study: AI can make better clinical decisions than humans

#artificialintelligence

But what if that second opinion could be generated by a computer, using artificial intelligence? Would it come up with better treatment recommendations than your professional proposes? A pair of Canadian mental-health researchers believe it can. In a study published in the Journal of Applied Behavior Analysis, Marc Lanovaz of Université de Montréal and Kieva Hranchuk of St. Lawrence College, in Ontario, make a case for using AI in treating behavioral problems. "Medical and educational professionals frequently disagree on the effectiveness of behavioral interventions, which may cause people to receive inadequate treatment," said Lanovaz, an associate professor who heads the Applied Behavioral Research Lab at UdeM's School of Psychoeducation.


3 Questions: Artificial intelligence for health care equity

#artificialintelligence

The potential of artificial intelligence to bring equity in health care has spurred significant research efforts. Racial, gender, and socioeconomic disparities have traditionally afflicted health care systems in ways that are difficult to detect and quantify. New AI technologies, however, are providing a platform for change. Regina Barzilay, the School of Engineering Distinguished Professor of AI and Health and faculty co-lead of AI for the MIT Jameel Clinic; Fotini Christia, professor of political science and director of the MIT Sociotechnical Systems Research Center; and Collin Stultz, professor of electrical engineering and computer science and a cardiologist at Massachusetts General Hospital -- discuss here the role of AI in equitable health care, current solutions, and policy implications. The three are co-chairs of the AI for Healthcare Equity Conference, taking place April 12. Q: How can AI help address racial, gender, and socioeconomic disparities in health-care systems?